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Abstract 



We introduce two efficient algorithms for computing the partial Fourier transforms in 
one and two dimensions. Our study is motivated by the wave extrapolation procedure in 
reflection seismology. In both algorithms, the main idea is to decompose the summation 
domain of into simpler components in a multiscale way. Existing fast algorithms are 
then applied to each component to obtain optimal complexity. The algorithm in ID is 
exact and takes 0(N log 2 N) steps. Our solution in 2D is an approximate but accurate 
algorithm that takes 0(N 2 log 2 N) steps. In both cases, the complexities are almost 
linear in terms of the degree of freedom. We provide numerical results on several test 
examples. 

Keywords. Fast Fourier transform; Multiscale decomposition; Butterfly algorithm; 
Fractional Fourier transform; Wave extrapolation. 
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1 Introduction 

In this paper, we introduce efficient algorithms for the following partial Fourier transform 
problem in one and two dimensions. In ID, let N be a large integer and c°(t) be a smooth 
function on [0,1] with < c°(t) < 1. We define a sequence of integers {c x ,0 < x < N} 
by c x = N\c°(x/N)~\ . Given a sequence of N numbers {/&, < k < N}, the problem is to 
compute {u x , < x < N} where 



where we use i for \J— 1 throughout this paper. In 2D, the problem is defined in a similar 

way. Now let c (t) be a smooth function on [0, l] 2 with < c°(t) < 1. The array {c x , < 
xi,X2 < N} where x = (x%,X2) is defined by c x = N\c°(x/N)~\. Given an array of N 2 
numbers {/*,, < k±, < N} where k = (k±, £2)1 we want to compute {u x , < xi, xi < N} 
defined by 



\k\<c x 

Due to the existence of the summation constraints on k in ([!]) and ^ , the fast Fourier 
transform cannot be used directly here. Direct computation of (IT]) and (I2J) has quadratic 




(1) 



k<c x 





(2) 



complexity; i.e., 0(N 2 ) operations for ([I]) and 0(N 4: ) for This can be expensive for 
large values of N, especially in 2D. In this paper, we propose efficient solutions which have 
almost linear complexity. Our algorithm for the ID case is exact and takes 0(N log 2 N) 
steps, while our solution to the 2D problem is an accurate approximate algorithm with an 
0(N 2 log 2 N) complexity. We define the summation domain D to be the set of all pairs (x, k) 
appeared in the summation, namely D = {(x, k),k < c x } in ID and D = {(x, k),\k\ < c x } 
in 2D. The main idea behind both algorithms is to partition the summation domain into 
simple components in a multiscale fashion. Fast algorithms are then invoked on each simple 
component to achieve maximum efficiency 

The partial Fourier transform appears naturally in several settings. The one which 
motivates our research is the one-way wave extrapolation method in seismology jl] , where 
one often needs to compute an approximation to the following integral [8]: 

u z (x) = [ e 2n < x - k+ ^ 2/v2ix) ~ k2 - z) u^(k)dk, (3) 



where u and z are fixed constants (frequency and extrapolation depth), v{x) is a given 
function (layer velocity), and uo{k) is the Fourier transform of a function uq{x). The 
wave modes that correspond to < uj/v{x) are propagating waves, while the ones that 
correspond to \k\ > w/p(x) are called evanescent. For the purposes of seismic imaging, 
one is often interested in only the propagating mode and, therefore, we have the following 
restricted integral to evaluate: 

J\k\<ui/v(x) 

To make the computation efficient, the term with the square root under is often approxi- 
mated with a functional form 

e 2 ™V" 2 /* 2 (x)-K 2 -* „ f n (x) Mk) (4) 

n 

with a limited number of terms. The integral then reduces to 

£/n(*) 

n 

where Wn{k) = tp n (k)uo(k). The kernel of this approximation involves computing 

e 2mx - k Wn{k)dk, 

\k\<w / p(x) 

which takes the forms of ([!]) and ^ after discretization. 

The rest of this paper is organized as follows. In Section[2j we describe our ID algorithm. 
The 2D algorithm is presented in Section [3} Finally, we conclude with some discussions for 
future work in Section |4j Throughout this paper, we assume for simplicity that iV is an 
integer power of 2. 



e 2mx Wn{k)dk. 

<jj/v(x) 
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2 Partial Fourier Transform in ID 



2.1 Algorithm description 

We define the summation domain by D = {(x, k), k < c x }. The idea behind our algorithm 
for computing 

u * = E e 2mxk/N fk, < x < N 

k<c x 

is to decompose D in a multiscale fashion. Starting from the top level box [0,iV] 2 , we 
partition the domain recursively. If a box B is fully inside D, it is not subdivided and we 
keep it inside the decomposition. If a box B is fully outside D, it is discarded. Finally, if 
a box B has parts that belong to both D and [0,N] 2 \D, it is further subdivided into four 
child boxes with equal size. At the end of this process, our decomposition contains a group 
of boxes with dyadic sizes and the union of these boxes is exactly equal to D (see Figure 




Figure 1: Left: the curve stands for {c x } and the summation domain D is the region below 
the curve. Right: the multiscale decomposition constructed for D. The boxes increase their 
size geometrically as they move away from the curve. 

Let us consider a single box B of the decomposition. Suppose that B is of size s and 
that its lower-left corner of B is (x B , k B ). The part of the summation that associated with 
B is 

£ e 2 ^ N f k (5) 

k B <k<k B +s 

for each x B < x < x B + s. Denoting x = x B + x' and k = k B + k', we can write this into a 
matrix form Mf with 



2m(x B +x')(k B +&') /N _ 2m(x B +x')k B /N . 2mx'k'/N , 2mx B (k B +k')/N 



Noticing that the first and the third terms depend only on x' and k', respectively, we can 
factorize M as M = Mi ■ M2 ■ M3 , where M\ and M3 are diagonal matrices and M2 is given 
by 

(M 2 ) x , k , = e 2mx ' k 'l N (6) 

for < x',k' < s. In fact, ([6| is the matrix of the fractional Fourier transform [3], which 
can be evaluated in only O(slogs) operations. Furthermore, since both Mi and M3 are 
diagonal matrices, §5§ can be computed in O(slogs) steps as well. 
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Based on this observation, our algorithm takes the following form: 

1. Construct a decomposition for D = {(x, k), k < c x }. Starting from [0, N] 2 , we parti- 
tion the boxes recursively. A box fully inside the D is not further subdivided. The 
union of the boxes in the final decomposition is equal to D. 

2. For s = 1, 2, 4, 8, • • ■ , N, visit all of the boxes of size s in the decomposition. Suppose 
B is one such box. Compute the summation associated with B 

J2 e 2^k/N fk 
k B <k<k B +s 

for x B < x < x B + s using the fractional Fourier transform, and add the result to 
{u x ,x B < x < x B + s}. 

The first step of our algorithm clearly takes at most 0(N log N) steps. To estimate the 
complexity for the second step, one needs to have a bound on the number of boxes of size 
s. Based on the construction of the decomposition, we know that the center of a box B 
of size s is at most of distance s away from the curve {(x, c x )} because otherwise B would 
have been partitioned further. As a result, the centers of all of the boxes of size s must 
fall within a band along {(x,c x )} of width 2s. Noticing that c (t) is smooth, the length 
of {(x,c x )} is at most N. Therefore, the area of the band is at most O(Ns) and there 
are at most 0(Ns/s 2 ) = 0(N/s) boxes of size s. Since we spend O(slogs) operations in 
the fractional Fourier transform for each box of size s, the number of steps for a fixed s 
is 0{N/s ■ slogs) = O(iVlogs) = 0(N log N). Finally, summing over all logiV possible 
values of s, we conclude that our algorithm is 0(N log 2 N). As no approximation has been 
made, our algorithm is exact. 

2.2 Numerical results 

We apply our algorithm to several test examples to illustrate its properties. All of the 
results presented here are obtained on a desktop computer with a 2.8GHz CPU. For each 
example, we use the following notations. T a is the running time of our algorithm in seconds, 
Rd/a is the ratio of the running time of direct evaluation to T a , and R a /f is the ratio of T a 
over the running time of a Fourier transform (timed using FFTW |7 ) . As our algorithm is 
0(A^log 2 N), we expect Rd/ a to grow almost linearly and R a /f like logiV. 

Tables [TJ [2] and [3] summarize the results for three testing cases. The function in Table 
[3] corresponds to a 100 Hertz wave propagation through a slice of the Marmousi velocity 
model [11] taken at 2 km depth. From these examples, we observe clearly that Rd/ai the 
ratio between the running times of direct evaluation and our algorithm, indeed grows almost 
linearly in terms of N. Although the ratio R a /f has some fluctuations, its value grows very 
slowly with respect to N. 

3 Partial Fourier Transform in 2D 

A direct extension of the ID algorithm to the 2D case would partition the four dimensional 
summation domain D = {(x, k), \k\ < c x } with a similar 4D tree structure. However, this 
does not result in an algorithm with optimal complexity. To see this, let us count the number 
of boxes of size s in our tree structure. Repeating the argument used in the complexity 
analysis of the ID algorithm, we conclude that there are about A^ 3 s/s 4 = A r3 /s 3 boxes of 
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N 


T a (sec) 


Rd/a 


Ra/f 


1024 


5.00e-03 


3.00e+01 


8.57e+00 


2048 


1.00e-02 


6.40e+01 


9.27e+00 


4096 


2.25e-02 


1.14e+02 


1.09e+01 


8192 


5.50e-02 


1.86e+02 


1.36e+01 


16384 


1.20e-01 


3.41e+02 


9.23e+00 


32768 


2.30e-01 


7.12e+02 


4.15e+01 


65536 


4.60e-01 


1.42e+03 


3.98e+01 


131072 


9.20e-01 


2.49e+03 


3.68e+01 


262144 


1.89e+00 


4.16e+03 


3.48e+01 


524288 


3.84e+00 


8.19e+03 


8.30e+01 


1048576 


8.16e+00 


1.29e+04 


7.25e+01 



Table 1: Top-left: the curve c x and the decomposition of D. Top-right: running time of our 
algorithm as a function of N. Bottom: the results for different values of N. T a : the running 
time of our algorithm. R^/a'- the ratio between direct evaluation and our algorithm. R a /f : 
the ratio between our algorithm and one execution of FFT of size N. 

size s. Even though the computation associated with each box can be done in about s 2 log s 
steps, the total operation count for a fixed s is about N 3 /s 3 ■ s 2 logs = N 3 / slogs, which is 
much larger than the degree of freedom iV 2 for small values of s. 

3.1 Algorithm description 

Noticing that only \k\ appears in the constraint of the 2D partial Fourier transform 

u *= E e 2 ™- k ' N f k 0< Xl ,x 2 <N, 

\k\<c x 

we study a different set R = {(x, r),r < c x } instead. 

The algorithm first generates a decomposition for R. Similar to the ID case, we partition 
the box [0, iV] 3 through recursive subdivision. A box is not further subdivided if it fully 
resides in R. The union of all the boxes inside the decomposition is exactly the set R (see 
Figure |2j). 

The projection of any box of our decomposition onto the r coordinate is a dyadic interval. 
Let us consider one such interval A of size s and denote G A to be the set of all cubes that 
project onto A. We define K A to be the set {k, \ k\ G ^4} and X A to be the image of the 
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N 


T a (sec) 


Rd/a 


Ra/f 


1024 


6.25e-03 


2.40e+01 


1.07e+01 


2048 


1.38e-02 


4.36e+01 


1.27e+01 


4096 


3.25e-02 


7.88e+01 


1.54e+01 


8192 


7.50e-02 


1.37e+02 


1.86e+01 


16384 


1.70e-01 


2.41e+02 


1.09e+02 


32768 


3.10e-01 


5.29e+02 


9.92e+01 


65536 


6.90e-01 


8.90e+02 


5.89e+01 


131072 


1.42e+00 


1.85e+03 


5.83e+01 


262144 


3.03e+00 


2.81e+03 


5.64e+01 


524288 


6.42e+00 


4.90e+03 


1.17e+02 


1048576 


1.35e+01 


7.77e+03 


1.28e+02 



Table 2: Top-left: the curve c x and the decomposition of D. Top-right: running time of 
our algorithm as a function of N. Bottom: the results for different values of N. 

points in G A under the projection onto the {x\,x-z) plane. Since A is an interval of size s, 
K is in fact a band in the (fcijfo) domain with length O(N) and width s. Noticing that 
the surface c°(i) used to define {c x ,0 < x±,X2 < N} is smooth, the set Xa is also a band 
in the (x\,X2) domain with length O(N) and width O(s) (see Figure [3]). 
The part of the summation associated with the interval A is 

£ e 2mx ' k/N f k (7) 

k£K A 

for x G X . Since X A and K A are two bands in [0, N) 2 , ([7]) is indeed a Fourier transform 
problem with sparse data. To compute ([7]), we utilize the solution proposed in |12j . This 
approach is a butterfly algorithm based on [91 [10] and computes an approximation of ([7| in 
©(maxdX" 4 !, l-fC" 4 !) log(max(|X" 4 |, |AT^|)) operations, almost linear in terms of the degree 
of freedom. 

Combining these ideas, we have the following algorithm: 

1. Construct a decomposition for R = {(x,r),r < c x }. Starting from [0, N] 3 , we parti- 
tion the boxes recursively. A box fully inside the R is not further subdivided. The 
union of the boxes in the final decomposition is equal to R. 

2. For s = 1,2,4,8,- •• ,N, visit all the dyadic intervals of size s in the r coordinate. 



6 




N 


T a (sec) 


Rd/a 


Ra/f 


1024 


3.13e-03 


4.48e+01 


5.45e+01 


2048 


6.25e-03 


9.60e+01 


6.56e+01 


4096 


1.25e-02 


1.92e+02 


5.95e+00 


8192 


4.00e-02 


2.56e+02 


5.00e+01 


16384 


9.00e-02 


4.27e+02 


5.43e+00 


32768 


1.70e-01 


9.04e+02 


9.85e+00 


65536 


3.60e-01 


1.82e+03 


3.09e+01 


131072 


7.10e-01 


3.46e+03 


2.91e+01 


262144 


1.56e+00 


5.46e+03 


2.90e+01 


524288 


3.28e+00 


1.04e+04 


6.64e+01 


1048576 


6.66e+00 


1.57e+04 


6.50e+01 



Table 3: Top-left: the curve c x and the decomposition of D. Top-right: running time of 
our algorithm as a function of N. Bottom: the results for different values of N. 

Suppose that A is one such interval. Compute the summation associated with A 

e 2mx - k ' N f k 

k£K A 

for x G X A using the butterfly procedure in |12| . and add the result to {u x , x £ X A }. 

Let us consider now the complexity of this algorithm. The first step of our algorithm 
takes only 0(N 2 log N) steps. In order to estimate the number of operations used in the 
second step, let us consider a fixed s. For each interval A of size s, the number of steps 
used in 0(max(|X A |, \K A \) log(max(|X A |, \K A \)) = 0{\X A \ log \X A \) + 0{\K A \ log \K A \). 
Summing over all boxes of size s, we get 



X A \loz\X A \ +0 V li^llogl^l 





Noticing 5Za-|A|=s 1-^* I = Ea-IAI=s 1-^" I = * ne a bove quantity is clearly bounded by 
0(A^ 2 logA^). Finally after summing over all different values of s, we have a total complexity 
of order 0{N 2 log 2 N). 
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Figure 2: Left: c x is a Gaussian function. Right: a cross section view of the multiscale 
decomposition of R (the domain below the surface c x . Here N = 32. 




Figure 3: Left: the boxes inside the set G A for an given interval A of the r coordinate. 
Middle: the set K A in the (k\,hi) plane. Right: the set X A in the (xi,X2) plane. The 
butterfly algorithm in [12] evaluates the summation between X A and K A in almost linear 
time. Here TV = 32. 

3.2 Numerical results 

We apply our algorithm to several examples in this section. In |12| . equivalent charges 
located at Cartesian grids are used as the low rank representations in the butterfly algorithm 
to control the accuracy of the method. The size of the Cartesian grid p controls the accuracy 
of our algorithm. Here, we pick p to be 5 or 9. To quantify the error, we select a set 
S C {0, 1, • • • , N — l} 2 of size 100 and estimate the error by 

V EzesM 2 

where {u x } are the exact results and {u^} are our approximations. 

Similar to the ID case, the following notations are used: T a is the running time of our 
algorithm in seconds, R^/a 1S the ratio of the running time of direct evaluation to T a , R a /f 
is the ratio of T a over the running time of a Fourier transform (timed using FFTW [7]), 
and finally E a is the estimated error. 

The numerical results are summarized in Tables |4j [5] and [6j The function in Table [6] 
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N 2 



(N,p) 


T a (sec) 


Rd/a 


Ra/f 


E a 


(128,5) 


8.50e-01 


4.82e+01 


9.89e+02 


5.25e-04 


(256,5) 


5.21e+00 


1.26e+02 


1.59e+03 


9.85e-04 


(512,5) 


3.17e+01 


3.76e+02 


1.56e+03 


7.40e-04 


(1024,5) 


1.90e+02 


1.13e+03 


1.61e+03 


1.21e-03 


(2048,5) 


9.60e+02 


4.94e+03 


1.31e+03 


1.20e-03 


(128,9) 


6.80e-01 


6.02e+01 


7.25e+02 


1.46e-14 


(256,9) 


5.33e+00 


1.29e+02 


1.42e+03 


8.87e-09 


(512,9) 


4.18e+01 


2.98e+02 


1.63e+03 


2.17e-08 


(1024,9) 


2.90e+02 


7.49e+02 


1.97e+03 


9.34e-09 


(2048,9) 


1.62e+03 


2.92e+03 


2.10e+03 


2.39e-08 



Table 4: Top-left: c x when N = 128. Top-right: running time of our algorithm as a 
function of N 2 . Bottom: the results for different values of N. T a : the running time of our 
algorithm. R^u: the ratio between direct evaluation and our algorithm. R a /f : the ratio 
between our algorithm and one execution of FFT of size N. E a : the estimated error. 

corresponds to a 50 Hertz wave propagation through a slice of the SEG/EAGE velocity 
model [1] taken at 1.5 km depth. From these numbers, we see that our implementation 
indeed has a complexity almost linear in terms of the number of grid points. Due to the 
complex structure of the butterfly procedure, the constant of our algorithm is quite large 
compared to the one of FFTW. 

4 Conclusions and Discussions 

In this paper, we introduced two efficient algorithms for computing partial Fourier trans- 
forms in one and two dimensions. In both cases, we start by decomposing the appropriate 
summation domain in a multiscale way into simple pieces and apply existing fast algorithms 
on each piece to get optimal efficiency. In ID, the fractional Fourier transform is used. In 
2D, we resort to the butterfly algorithm for sparse Fourier transform proposed in [12]. As a 
result, both of our algorithms are asymptotically only 0(log N) times more expensive than 
the FFT. 

In Tables [4] [5] and [6j we notice that our 2D algorithm has a relatively large constant. 
One obvious direction of future research is to improve on our current implementation of 
the butterfly algorithm. Another alternative is to seek different ways for computing the 
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(N,p) 


T a (sec) 


Rd/a 


Ra/f 


E a 


(128,5) 


1.31e+00 


3.13e+01 


1.52e+03 


7.11e-04 


(256,5) 


1.13e+01 


6.35e+01 


2.91e+03 


5.17e-04 


(512,5) 


7.53e+01 


1.65e+02 


2.94e+03 


1.27e-03 


(1024,5) 


4.16e+02 


5.45e+02 


2.62e+03 


8.32e-04 


(2048,5) 


2.10e+03 


2.28e+03 


2.71e+03 


8.88e-04 


(128,9) 


9.50e-01 


4.31e+01 


1.01e+03 


2.02e-14 


(256,9) 


1.20e+01 


5.75e+01 


3.06e+03 


6.73e-09 


(512,9) 


1.08e+02 


1.17e+02 


4.20e+03 


9.35e-09 


(1024,9) 


6.51e+02 


3.46e+02 


4.10e+03 


1.32e-08 


(2048,9) 


3.21e+03 


1.48e+03 


4.12e+03 


2.00e-08 



Table 5: Top-left: c x when N = 128. Top-right: running time of our algorithm as a 
function of N 2 . Bottom: the results for different values of N. 

Fourier transforms with sparse data. In the past several years, several algorithms have been 
developed to address similar oscillatory behavior efficiently (see, for example, [21 016]). It 
would be interesting to see whether these ideas can be used in the setting of the Fourier 
transform with sparse data. 

As we mentioned earlier, this research is motivated mostly by the wave extrapolation 
algorithm in seismic imaging. Our model problem considers only one of the challenges, 
i.e., the existence of the summation constraint. The other challenge is to improve the 
evaluation of the e 2m V uj2 / v2 ( x )~ k2 ' z term, for example, by approximating it on each of the 
simple summation components. Research along this direction will be presented in a future 
report. 

Acknowledgments. The first author is partially supported by an Alfred P. Sloan 
Fellowship and the startup grant from the University of Texas at Austin. 
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